AITopics | marl method

RMIX: LearningRisk-SensitivePoliciesfor CooperativeReinforcementLearningAgents

Neural Information Processing SystemsFeb-19-2026, 08:36:25 GMT

Current value-based multi-agent reinforcement learning methods optimize individual Q values to guide individuals' behaviours via centralized training with decentralized execution (CTDE). However, such expected, i.e., risk-neutral, Q value is not sufficient even with CTDE due to the randomness of rewards and the uncertainty in environments, which causes the failure of these methods to train coordinating agents incomplexenvironments. Toaddress these issues, we propose RMIX, anovelcooperativeMARL method with theConditional Value at Risk (CVaR) measure over the learned distributions of individuals' Q values. Specifically, we first learn the return distributions of individuals to analytically calculate CVaRfordecentralized execution. Then,tohandle thetemporal nature of the stochastic outcomes during executions, we propose a dynamic risk level predictorforriskleveltuning.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Oregon (0.04)
Asia > Singapore (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL *Pascal Leroy

Neural Information Processing SystemsFeb-16-2026, 09:29:53 GMT

In IMP, a multi-component engineering system is subject to a risk of failure due to its components' damage condition.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > Portugal > Braga > Braga (0.04)
Europe > Denmark (0.04)
Europe > Belgium (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Leisure & Entertainment > Games (0.46)
Energy > Renewable > Wind (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
(3 more...)

Add feedback

HeterogeneousSkillLearningforMulti-agent Tasks

Neural Information Processing SystemsFeb-12-2026, 18:56:00 GMT

Meanwhile, diverseskill-based policies are generated through a novel skill-based policy learning method. To promote efficient skill discovery, a mutual information based intrinsic reward function is constructed.

agent, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL

Neural Information Processing SystemsDec-26-2025, 12:22:46 GMT

We introduce IMP-MARL, an open-source suite of multi-agent reinforcement learning (MARL) environments for large-scale Infrastructure Management Planning (IMP), offering a platform for benchmarking the scalability of cooperative MARL methods in real-world engineering applications.In IMP, a multi-component engineering system is subject to a risk of failure due to its components' damage condition.Specifically, each agent plans inspections and repairs for a specific system component, aiming to minimise maintenance costs while cooperating to minimise system failure risk.With IMP-MARL, we release several environments including one related to offshore wind structural systems, in an effort to meet today's needs to improve management strategies to support sustainable and reliable energy systems.Supported by IMP practical engineering environments featuring up to 100 agents, we conduct a benchmark campaign, where the scalability and performance of state-of-the-art cooperative MARL methods are compared against expert-based heuristic policies. The results reveal that centralised training with decentralised execution methods scale better with the number of agents than fully centralised or decentralised RL approaches, while also outperforming expert-based heuristic policies in most IMP environments.Based on our findings, we additionally outline remaining cooperation and scalability challenges that future MARL methods should still address.Through IMP-MARL, we encourage the implementation of new environments and the further development of MARL methods.

imp-marl, infrastructure management planning, large-scale infrastructure management planning, (6 more...)

Neural Information Processing Systems

Industry: Energy (0.96)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)

Add feedback

a7a7c0c92f195cce85f99768621ac6c0-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsOct-9-2025, 03:59:32 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > Portugal > Braga > Braga (0.04)
Europe > Denmark (0.04)
Europe > Belgium (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Leisure & Entertainment > Games (0.46)
Energy > Renewable > Wind (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
(3 more...)

Add feedback

c2626d850c80ea07e7511bbae4c76f4b-Paper.pdf

Neural Information Processing SystemsAug-17-2025, 05:33:58 GMT

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > Oregon (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
(2 more...)

Add feedback

594ca7adb3277c51a998252e2d4c906e-Paper.pdf

Neural Information Processing SystemsAug-14-2025, 16:08:49 GMT

agent, objective, sdp system, (12 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation (0.71)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

The Composite Task Challenge for Cooperative Multi-Agent Reinforcement Learning

Li, Yurui, Chen, Yuxuan, Zhang, Li, Li, Shijian, Pan, Gang

arXiv.org Artificial IntelligenceFeb-1-2025

The significant role of division of labor (DOL) in promoting cooperation is widely recognized in real-world applications.Many cooperative multi-agent reinforcement learning (MARL) methods have incorporated the concept of DOL to improve cooperation among agents.However, the tasks used in existing testbeds typically correspond to tasks where DOL is often not a necessary feature for achieving optimal policies.Additionally, the full utilize of DOL concept in MARL methods remains unrealized due to the absence of appropriate tasks.To enhance the generality and applicability of MARL methods in real-world scenarios, there is a necessary to develop tasks that demand multi-agent DOL and cooperation.In this paper, we propose a series of tasks designed to meet these requirements, drawing on real-world rules as the guidance for their design.We guarantee that DOL and cooperation are necessary condition for completing tasks and introduce three factors to expand the diversity of proposed tasks to cover more realistic situations.We evaluate 10 cooperative MARL methods on the proposed tasks.The results indicate that all baselines perform poorly on these tasks.To further validate the solvability of these tasks, we also propose simplified variants of proposed tasks.Experimental results show that baselines are able to handle these simplified variants, providing evidence of the solvability of the proposed tasks.The source files is available at https://github.com/Yurui-Li/CTC.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2502.00345

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.84)

Add feedback

IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL

Neural Information Processing SystemsJan-19-2025, 18:13:50 GMT

We introduce IMP-MARL, an open-source suite of multi-agent reinforcement learning (MARL) environments for large-scale Infrastructure Management Planning (IMP), offering a platform for benchmarking the scalability of cooperative MARL methods in real-world engineering applications.In IMP, a multi-component engineering system is subject to a risk of failure due to its components' damage condition.Specifically, each agent plans inspections and repairs for a specific system component, aiming to minimise maintenance costs while cooperating to minimise system failure risk.With IMP-MARL, we release several environments including one related to offshore wind structural systems, in an effort to meet today's needs to improve management strategies to support sustainable and reliable energy systems.Supported by IMP practical engineering environments featuring up to 100 agents, we conduct a benchmark campaign, where the scalability and performance of state-of-the-art cooperative MARL methods are compared against expert-based heuristic policies. The results reveal that centralised training with decentralised execution methods scale better with the number of agents than fully centralised or decentralised RL approaches, while also outperforming expert-based heuristic policies in most IMP environments.Based on our findings, we additionally outline remaining cooperation and scalability challenges that future MARL methods should still address.Through IMP-MARL, we encourage the implementation of new environments and the further development of MARL methods.

imp-marl, infrastructure management planning, large-scale infrastructure management planning, (4 more...)

Neural Information Processing Systems

Industry: Energy (0.99)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

Add feedback

Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

Xu, Linjie, Liu, Zichuan, Dockhorn, Alexander, Perez-Liebana, Diego, Wang, Jinyu, Song, Lei, Bian, Jiang

arXiv.org Artificial IntelligenceApr-15-2024

One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency, we look at the widely used episodic training mechanism. In each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of the gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate $3$ MARL methods on $6$ SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency for MARL algorithms. The codes to reimplement the results presented in this paper are open-sourced at https://anonymous.4open.science/r/rr_for_MARL-0D83/.

agent, higher rr, sample efficiency, (15 more...)

arXiv.org Artificial Intelligence

2404.09715

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (0.94)

Technology: